Bilingual and Cross Domain Politics Analysis
Authors
Abstract
Opinion mining on Twitter has recently attracted research interest in politics, drawing on Information Retrieval (IR) and Natural Language Processing (NLP). However, obtaining domain-specific annotated data remains a costly manual step. In addition, the amount and quality of these annotations may be critical to the performance of machine learning (ML)-based systems. An alternative solution is to use cross-language and cross-domain sets to simulate training data. This paper describes an ML approach to automatically annotate Spanish tweets dealing with the online reputation of politicians. Our main finding is that a simple statistical NLP classifier without in-domain training can provide annotations as reliable as those of human annotators, and can outperform more specific resources such as lexicons or in-domain data.
Similar resources
From Bilingual Dictionaries to Interlingual Document Representations
Mapping documents into an interlingual representation can help bridge the language barrier of a cross-lingual corpus. Previous approaches use aligned documents as training data to learn an interlingual representation, making them sensitive to the domain of the training data. In this paper, we learn an interlingual representation in an unsupervised manner using only a bilingual dictionary. We fi...
Cross-Lingual Sentiment Classification with Bilingual Document Representation Learning
Cross-lingual sentiment classification aims to adapt the sentiment resource in a resource-rich language to a resource-poor language. In this study, we propose a representation learning approach which simultaneously learns vector representations for the texts in both the source and the target languages. Different from previous research which only gets bilingual word embedding, our Bilingual Docu...
Disambiguation of Compound Noun Translations Extracted from Bilingual Comparable Corpora
Bilingual machine readable dictionaries are important and indispensable information resources for cross-language information retrieval, machine translation, and so on. In this paper, we describe a bilingual dictionary acquisition system which extracts translations from non-parallel but comparable corpora of a specific academic domain and disambiguates the extracted translations. We also experim...
Automated Alignment and Extraction of a Bilingual Ontology for Cross-Language Domain-Specific Applications
This paper presents a novel approach to ontology alignment and domain ontology extraction from two existing knowledge bases: WordNet and HowNet. These two knowledge bases are automatically aligned to construct a bilingual ontology based on the co-occurrence of words in a bilingual parallel corpus. The bilingual ontology achieves greater structural and semantic information coverage from these tw...
Automatic Parallel Corpora and Bilingual Terminology extraction from Parallel WebSites
In our days, the notion, the importance and the significance of parallel corpora is so big that needs no special introduction. Unfortunately, public available parallel corpora is somewhat limited in range. There are big corpora about politics or legislation, about medicine and other specific areas, but we miss corpora for other different areas. Currently there is a huge investment on using the ...
Journal title:
- Research in Computing Science
Volume 85, Issue
Pages: -
Publication date: 2014